1. While there are certainly issues with this image, do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
The graph appears to plot, for each country, the proportion of people who believe vaccines are safe. The plot is split by region, with certain countries labeled. Within each region's plot, a line marks the regional median.
2. List the variables that appear to be displayed in this visualization.
The variables in this visualization are country, the country’s region, and the % of people in each country who believe vaccines are safe. The x-axis shows the % of the country’s population that believes vaccines are safe. The y-axis appears to be the index of the country within its region, with 1 being the country with the smallest % believing vaccines are safe.
3. Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.
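The x aesthetic is the % of people who believe vaccines are safe, and the y aesthetic is the country’s index within its region. Color is mapped to region, so the regions are distinguished by the color of the points. Country name is used as a text label, but only for some countries. The median % of people who believe vaccines are safe per region is drawn as a line within each plot.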
4. What type of graph would you call this?
This wouldn’t really be a scatter plot, as scatter plots take quantitative variables for both x and y. I would call this a segmented bar plot with points instead of bars. Each point represents a quantitative value for each country, which is what a bar plot should do. An argument could also be made that this visualization is a set of box plots, one per region, since the median is marked as in a box plot and we get a decent sense of the spread. The only issue is that there are no boxes.
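To make the aesthetic mapping described in the answers above concrete, here is a minimal sketch of the same layout in ggplot2. The data frame `toy`, its column names, and all of its values are made up for illustration and are not figures from the report.

```r
library(ggplot2)

# Made-up values for illustration only; these are NOT figures from the report.
toy <- data.frame(
  country  = c("A", "B", "C", "D", "E", "F"),
  region   = c("Region 1", "Region 1", "Region 1", "Region 2", "Region 2", "Region 2"),
  pct_safe = c(62, 85, 74, 91, 55, 80)
)

# y position: rank of each country within its region (1 = lowest percentage).
toy$idx <- ave(toy$pct_safe, toy$region, FUN = rank)

# One median per region, drawn as a line inside that region's panel.
medians <- aggregate(pct_safe ~ region, data = toy, FUN = median)

p <- ggplot(toy, aes(x = pct_safe, y = idx, color = region)) +
  geom_point() +
  geom_vline(data = medians, aes(xintercept = pct_safe)) +
  geom_text(aes(label = country), hjust = -0.5, show.legend = FALSE) +
  facet_wrap(~region, ncol = 1) +
  labs(x = "% who believe vaccines are safe", y = NULL) +
  theme(legend.position = "none")
p
```

Here x and y are both numeric, which is why the result can look like a scatter plot even though y carries no information beyond the ordering of countries.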
5. List all of the problems or things you would improve about this graph.
library(tidyverse)
library(readxl)
library(ggthemes)
library(tidygeocoder)
library(leaflet)
library(gganimate)
df_world <- read_xlsx(here::here("wgm2018-dataset-crosstabs-all-countries.xlsx"), sheet = 2)
head(df_world)
## # A tibble: 6 × 60
## WP5 wgt PROJWT FIELD_DATE YEAR_C…¹ Q1 Q2 Q3 Q4 Q5A
## <dbl> <dbl> <dbl> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.653 171770. 2018-01-08 00:00:00 2018 3 2 1 2 2
## 2 1 0.696 183053. 2018-01-08 00:00:00 2018 2 2 1 2 1
## 3 1 0.524 137829. 2018-01-08 00:00:00 2018 2 2 1 98 1
## 4 1 0.764 201139. 2018-01-08 00:00:00 2018 2 1 1 2 1
## 5 1 3.33 875646. 2018-01-08 00:00:00 2018 2 1 1 2 1
## 6 1 0.371 97672. 2018-01-08 00:00:00 2018 1 1 1 2 1
## # … with 50 more variables: Q5B <dbl>, Q5C <dbl>, Q6 <dbl>, Q7 <dbl>, Q8 <dbl>,
## # Q9 <dbl>, Q10A <dbl>, Q10B <dbl>, Q11A <dbl>, Q11B <dbl>, Q11C <dbl>,
## # Q11D <dbl>, Q11E <dbl>, Q11F <dbl>, Q11G <dbl>, Q12 <dbl>, Q13 <dbl>,
## # Q14A <dbl>, Q14B <dbl>, Q15A <dbl>, Q15B <dbl>, Q16 <dbl>, Q17 <dbl>,
## # Q18 <dbl>, Q19 <dbl>, Q20 <dbl>, Q21 <dbl>, Q22 <dbl>, Q23 <dbl>,
## # Q24 <dbl>, Q25 <dbl>, Q26 <dbl>, Q27 <dbl>, Q28 <dbl>, D1 <dbl>, Q29 <dbl>,
## # Q30 <dbl>, WGM_Index <dbl>, WGM_Indexr <dbl>, ViewOfScience <dbl>, …
plt <- df_world %>%
  filter(Q23 == 1) %>%
  group_by(WP5) %>%
  # One row per country: agree count (Q25 codes 1 and 2), respondent count, region code
  summarize(vacc_agree = sum(Q25 %in% c(1, 2)),
            total = n(),
            Regions_Report = first(Regions_Report),
            .groups = "drop") %>%
  filter(Regions_Report != 0) %>%
  mutate(prop = vacc_agree / total * 100,
         region = case_when(
           Regions_Report %in% c(1, 2, 4, 5) ~ "Sub-Saharan Africa",
           Regions_Report %in% c(6:8) ~ "Americas",
           Regions_Report %in% c(9:12) ~ "Asia",
           Regions_Report %in% c(3, 13) ~ "Middle East and North Africa",
           Regions_Report %in% c(14:18) ~ "Europe and Oceania")
  )
ggplot(plt, aes(x = prop, y = region, fill = region)) +
  geom_boxplot() +
  # Label a single country (WP5 code 29) on its region's boxplot
  geom_text(data = subset(plt, WP5 == 29),
            mapping = aes(label = "Japan"),
            vjust = 2,
            size = 3) +
  labs(title = "Boxplots of % of people who believe \nvaccines are safe, \nby global region",
       x = "Percentages",
       y = "") +
  theme_economist() +
  theme(legend.position = "none")
1. Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?
The plot that I think could be improved is the one titled “People seeking science information by region and internet access” on page 37. The graph is trying to show the proportion of citizens in each region surveyed who responded yes to whether they have internet access, sought science information and have internet access, or sought science information and do not have internet access. It seems the authors meant to highlight the difference in internet access and willingness to search for information from region to region.
2. List the variables that appear to be displayed in this visualization.
The variables displayed are region, question, and the percentage of citizens within each region that answered yes to each of the three questions.
3. Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.
x = percentage, group = question, fill = question
4. What type of graph would you call this?
Bar plot
5. List all of the problems or things you would improve about this graph.
It is hard to see at first glance which region has the highest percentage for each question.
6. Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
library(tidyverse)
library(here)
library(gganimate)
questions <- data.frame(country = rep(c("World", "Eastern Africa", "Central Africa", "North Africa", "Southern Africa", "Western Africa", "Central America & Mexico", "Northern America", "South America", "Central Asia", "East Asia", "Southeast Asia", "South Asia", "Middle East", "Eastern Europe", "Northern Europe", "Southern Europe", "Western Europe", "AUS & NZ"), each = 3),
question = rep(c("Has internet access", "Sought health info & has internet access", "Sought health info & does not have internet access"), times = 19),
proportion = c(.54,.52,.28,.24,.56,.35,.27,.60,.40,.48,.47,.23,.46,.43,.34,.34,.46,.29,.55,.52,.34,.91,.74,.38,.62,.51,.33,.55,.42,.34,.67,.51,.3,.46,.46,.27,.25,.35,.22,.72,.58,.38,.74,.47,.33,.91,.59,.38,.82,.61,.47,.9,.63,.51,.93,.66,.31))
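questions <- questions %>%
  group_by(question) %>% # rank within each question so every animation state has ranks 1 to 19
  mutate(rank = rank(-proportion, ties.method = 'random')) %>%
  ungroup()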
p.static <- ggplot(questions, aes(rank, group = country,
fill = as.factor(country), color = as.factor(country))) +
geom_tile(aes(y = proportion/2,
height = proportion,
width = .9), alpha = 0.8, color = NA) +
geom_text(aes(y = 0, label = paste(country, " ")), vjust = 0.2, hjust = 1, size = 6) +
geom_text(aes(y = proportion, label = proportion, hjust = 0), size = 6) +
coord_flip(clip = "off", expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_x_reverse() +
guides(color = "none", fill = "none") +
theme(axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.position = "none",
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_line(size = 0.1, color = "black"),
panel.grid.minor.x = element_line(size = 0.1, color = "black"),
plot.title = element_text(size = 25),
plot.subtitle = element_text(size=18, face = "italic", color = "black", hjust = 0.5, vjust = -1),
plot.background = element_blank(),
plot.margin = margin(2, 2, 2, 6, "cm"))
anim <- p.static +
transition_states(question, transition_length = 4, state_length = 4) +
labs(title = 'Proportion of citizens that answered yes to: {closest_state}')
animate(anim, nframes = 200, fps = 5, width = 1200, height = 1000)
### Lab 2 Part 2 - Third Data Visualization Improvement
Chart 4.5: The combined view of how people feel about the benefits of science on a personal and country level, Page 84
This chart describes the four analyzable outcomes of combining one’s views on the extent to which science benefits society with one’s views on the extent to which science benefits people like them. The four outcomes are: The Excluded, The Enthusiasts, Skeptics, and The Included. The authors use proportionately sized circles within these four categories in an attempt to show visually which of these four opinions are more popular than the others.
2 Categorical Variables:
The results are then displayed as a percent of all of those surveyed.
Visually, this graph looks like a simple correlogram or bubble plot; however, it represents the density of respondents who answered a certain way rather than a correlative relationship between numeric variables. Functionally, the graph most closely matches a treemap, with the unsure/chose-not-to-answer respondents removed.
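As a rough illustration of what "a percent of all of those surveyed" means for two categorical variables, the sketch below cross-tabulates made-up responses and converts the counts to percentages. The data frame `toy` and its column names `society` and `personal` are hypothetical stand-ins, not the survey data.

```r
# Made-up responses for illustration; not the survey data.
toy <- data.frame(
  society  = c("Most", "Most", "Very Few", "Very Few", "Most", "Very Few", "Most", "Most"),
  personal = c("Yes",  "Yes",  "No",       "Yes",      "No",   "No",       "Yes",  "Yes")
)

# Cross-tabulate the two categorical variables, then convert the counts to
# percentages of all respondents.
counts <- table(toy$society, toy$personal)
shares <- round(100 * prop.table(counts), 1)
shares
```

Each cell of `shares` corresponds to one of the four answer combinations, so the four values sum to 100.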
countries <- c("United States", "Egypt", "Morocco", "Lebanon", "Saudi Arabia", "Jordan", "Turkey", "Pakistan", "Indonesia", "Bangladesh", "United Kingdom", "France", "Germany", "Netherlands", "Belgium", "Spain", "Italy", "Poland", "Hungary", "Czech Republic", "Romania", "Sweden", "Greece", "Denmark", "Iran", "Singapore", "Japan", "China", "India", "Venezuela", "Brazil", "Mexico", "Nigeria", "Kenya", "Tanzania", "Israel", "Palestinian Territories", "Ghana", "Uganda", "Benin", "Madagascar", "Malawi", "South Africa", "Canada", "Australia", "Philippines", "Sri Lanka", "Vietnam", "Thailand", "Cambodia", "Laos", "Myanmar", "New Zealand", "Botswana", "Ethiopia", "Mali", "Mauritania", "Mozambique", "Niger", "Rwanda", "Senegal", "Zambia", "South Korea", "Taiwan", "Afghanistan", "Belarus", "Georgia", "Kazakhstan", "Kyrgyzstan", "Moldova", "Russia", "Ukraine", "Burkina Faso", "Cameroon", "Sierra Leone", "Zimbabwe", "Costa Rica", "Albania", "Algeria", "Argentina", "Armenia", "Austria", "Azerbaijan", "Bolivia", "Bosnia and Herzegovina", "Bulgaria", "Burundi", "Chad", "Chile", "Colombia", "Comoros", "Republic of Congo", "Croatia", "Cyprus", "Dominican Republic", "Ecuador", "El Salvador", "Estonia", "Finland", "Gabon", "Guatemala", "Guinea", "Haiti", "Honduras", "Iceland", "Iraq", "Ireland", "Ivory Coast", "Kuwait", "Latvia", "Liberia", "Libya", "Lithuania", "Luxembourg", "Macedonia", "Malaysia", "Malta", "Mauritius", "Mongolia", "Montenegro", "Namibia", "Nepal", "Nicaragua", "Norway", "Panama", "Paraguay", "Peru", "Portugal", "Serbia", "Slovakia", "Slovenia", "Eswatini", "Switzerland", "Tajikistan", "The Gambia", "Togo", "Tunisia", "Turkmenistan", "United Arab Emirates", "Uruguay", "Uzbekistan", "Yemen", "Kosovo", "Northern Cyprus")
#Chat GPT took the list from excel and made it a comma separated list, all hail Chat GPT
data_og <- read_xlsx(here::here("wgm2018-dataset-crosstabs-all-countries.xlsx"), sheet = 2)
#Question 1: In general, do you think the work that scientists do benefits most, some, or very few people in this country?
#Question 2: In general, do you think the work that scientists do benefits people like you in this country?
data_clean <- data_og |> #Changing the Q1 and Q2 values
select(WP5, Q16, Q17) |>
mutate(Q16 = case_when(
Q16 == 1 ~ "Most",
Q16 == 2 ~ "Some",
Q16 == 3 ~ "Very Few",
Q16 == 98 ~ "DK",
Q16 == 99 ~ "R")
) |>
mutate(Q17 = case_when(
Q17 == 1 ~ "Yes",
Q17 == 2 ~ "No",
Q17 == 98 ~ "DK",
Q17 == 99 ~ "R")
) |>
filter(Q16 %in% c("Most", "Some", "Very Few"),
Q17 %in% c("Yes", "No")) |>
rename("Country" = WP5, "Q1" = Q16, "Q2" = Q17) |>
mutate(`Country` = countries[`Country`])
data_clean <- data_clean |> #importing country names and adding the combo column, case_when() wasn't working, sorry for ifelse spam
mutate (Combo = ifelse(`Q1` %in% "Most" & `Q2` %in% "Yes", "Benefits Them & Most", NA),
Combo = ifelse(`Q1` %in% "Very Few" & `Q2` %in% "Yes", "Benefits Them But Not Most", Combo),
Combo = ifelse((`Q1` %in% "Some" | `Q1` %in% "Most") & `Q2` %in% "No", "Doesn't Benefit Them But Benefits Most", Combo),
Combo = ifelse(`Q1` %in% "Very Few" & `Q2` %in% "No", "Doesn't Benefit Them Nor Most", Combo)) |>
na.omit()
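For comparison, the same labelling could probably be written with `case_when()`. The sketch below is one possible version run on a small made-up tibble (`toy` is hypothetical), and it keeps the behavior of the `ifelse()` chain above, including dropping rows that match no combination.

```r
library(dplyr)

# Toy responses covering every Q1/Q2 pairing used above (illustrative only).
toy <- tibble(
  Q1 = c("Most", "Very Few", "Some", "Most", "Very Few", "Some"),
  Q2 = c("Yes",  "Yes",      "No",   "No",   "No",       "Yes")
)

toy_combo <- toy |>
  mutate(Combo = case_when(
    Q1 == "Most" & Q2 == "Yes"             ~ "Benefits Them & Most",
    Q1 == "Very Few" & Q2 == "Yes"         ~ "Benefits Them But Not Most",
    Q1 %in% c("Some", "Most") & Q2 == "No" ~ "Doesn't Benefit Them But Benefits Most",
    Q1 == "Very Few" & Q2 == "No"          ~ "Doesn't Benefit Them Nor Most"
    # No default branch: unmatched rows ("Some" with "Yes") become NA
  )) |>
  na.omit()
```

Because every right-hand side is a character string and there is no default branch, unmatched rows get `NA` and are then removed by `na.omit()`, just as in the `ifelse()` version.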
data_clean_geo <- data_clean |> #Getting Geocode data for each country
geocode(country = `Country`) |>
na.omit()
combo_colors <- c("Benefits Them & Most" = "#FFCD27",
                  "Benefits Them But Not Most" = "#61933F",
                  "Doesn't Benefit Them But Benefits Most" = "#2FC1D2",
                  "Doesn't Benefit Them Nor Most" = "#003D57")
# Count respondents per country and answer combination, so each circle is sized by
# its own country's count instead of one total shared by every circle
combo_counts <- data_clean_geo |>
  count(Country, lat, long, Combo) |>
  mutate(color = unname(combo_colors[Combo]))
leaflet(combo_counts) |>
  addTiles() |>
  addCircles(lng = ~long,
             lat = ~lat,
             radius = ~n * 500, # metres; the factor of 500 is an arbitrary scaling choice, tune as needed
             popup = ~paste0(Country, ": ", Combo, " (n = ", n, ")"),
             color = ~color,
             opacity = .5) |>
  addLegend(position = "topright",
            title = "The combined view of how people feel about the benefits of science on a personal and country level per country",
            labels = names(combo_colors),
            colors = unname(combo_colors))